Burrows-Wheeler transformations and de Bruijn words

نویسنده

  • Peter M. Higgins
چکیده

We formulate and explain the extended Burrows–Wheeler transform ofMantaci et al. from the viewpoint of permutations on a chain taken as a union of partial order-preserving mappings. In so doing we establish a link with syntactic semigroups of languages that are themselves cyclic semigroups.We apply the extended transformwith a view to generating de Bruijn words through inverting the transform. We also make use of de Bruijn words to facilitate a proof that themaximum number of distinct factors of a word of length n has the form 2n 2 − O(n log n). © 2012 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indexing Variation Graphs

Variation graphs, which represent genetic variation within a population, are replacing sequences as reference genomes. Path indexes are one of the most important tools for working with variation graphs. They generalize text indexes to graphs, allowing one to find the paths matching the query string. We propose using de Bruijn graphs as path indexes, compressing them by merging redundant subgrap...

متن کامل

Wheeler Graphs: Variations on a Theme by Burrows and Wheeler

The famous Burrows-Wheeler Transform was originally defined for single strings but variations have been developed for sets of strings, labelled trees, de Bruijn graphs, alignments, etc. In this talk we propose a unifying view that includes many of these variations and that we hope will simplify the search for more. Somewhat surprisingly we get our unifying view by considering the Nondeterminist...

متن کامل

deBWT: parallel construction of Burrows–Wheeler Transform for large collection of genomes with de Bruijn-branch encoding

MOTIVATION With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to well organize and index many assembled genomes to promote future genomics studies. Burrows-Wheeler Transform (BWT) is an important data structure of genome indexing, which has many fundamental applications; however, it is still non-trivial to construct BWT for larg...

متن کامل

Wheeler graphs: A framework for BWT-based data structures☆

The famous Burrows-Wheeler Transform (BWT) was originally defined for a single string but variations have been developed for sets of strings, labeled trees, de Bruijn graphs, etc. In this paper we propose a framework that includes many of these variations and that we hope will simplify the search for more. We first define Wheeler graphs and show they have a property we call path coherence. We s...

متن کامل

Enhanced Word-Based Block-Sorting Text Compression

The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing se...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Theor. Comput. Sci.

دوره 457  شماره 

صفحات  -

تاریخ انتشار 2012